Classification-Predictive Model Based on Artificial Neural Network Validated by Histopathology and Direct Immunofluorescence for the Diagnosis of Oral Lichen Planus

The diagnosis of oral lichen planus (OLP) poses many challenges due to its nonspecific clinical symptoms and histopathological features. Therefore, the diagnostic process should include a thorough clinical history, immunological tests, and histopathology. Our study aimed to enhance the diagnostic accuracy of OLP by integrating direct immunofluorescence (DIF) results with clinical data to develop a multivariate predictive model based on an artificial neural network. Eighty patients were assessed using DIF for various markers (immunoglobulins of classes G, A, and M; complement 3; fibrinogen type 1 and 2) and clinical characteristics such as age, gender, and lesion location. Statistical analysis was performed using machine learning techniques in Statistica 13. The following variables were assessed: gender, age on the day of lesion onset, results of direct immunofluorescence, location of white patches, locations of erosions, treatment history, medications and dietary supplement intake, dental status, smoking status, flossing, and using mouthwash. Four statistically significant variables were selected for machine learning after the initial assessment. The final predictive model, based on neural networks, achieved 85% accuracy in the testing sample and 71% accuracy in the validation sample. Significant predictors included stress at onset, white patches under the tongue, and erosions on the mandibular gingiva. In conclusion, while the model shows promise, larger datasets and more comprehensive variables are needed to improve diagnostic accuracy for OLP, highlighting the need for further research and collaborative data collection efforts.


Introduction
Mouth ulcers affect around 25% of the adult population, with peak onset between 10 and 29 years of age [1,2]. The term is an umbrella for lesions with many different causes, including mechanical injuries (such as rough foods, brushing teeth, and dental prosthetics); viral, bacterial, or fungal infections [3][4][5]; genetic factors [6], mainly related to systemic inflammatory diseases (such as Crohn's disease [7], systemic lupus erythematosus [8], and Behçet's disease [9]); locally acting irritating chemical agents (such as whitening toothpaste, excessive alcohol consumption, and smoking) [10]; and allergic/adverse reactions to food, drugs [11], or dental materials. Lichen planus, a mucocutaneous condition that affects the oral cavity, is another possible cause. The disease affects 1% of the population [12] and is most commonly observed in females aged between 30 and 70 years. To diagnose suspected ulcerative lichen planus, a biopsy is essential due to the 0.44-2.28% risk of malignant lesion occurrence [13]. In lichen, a dense subepithelial lymphocytic band is observed on hematoxylin-eosin staining. The epithelium is keratotic with basilar degeneration and the presence of Civatte bodies (degenerating keratinocytes). A sawtooth appearance of the rete ridges may be present. However, lesions do not always yield a specific image on standard microscopy, so lichen cannot always be unequivocally confirmed or excluded on histopathological examination. Lesions that resemble lichen are called lichenoid lesions.
Oral lichen planus is a chronic inflammatory and autoimmune disorder. Although the exact cause of OLP is still not fully understood, it is thought to result from a disturbance in cell-mediated immune function due to a combination of genetic and environmental factors [14]. While indirect immunofluorescence and immunohistochemistry are not useful in OLP diagnosis [15,16], direct immunofluorescence (DIF) is considered a helpful tool for the differential diagnosis of oral lichen planus and oral lichenoid lesions [17]. Direct immunofluorescence is primarily used to diagnose blistering diseases such as pemphigus and pemphigoids in erosive lesions, particularly desquamative gingivitis [18]. In lichen planus, a shaggy deposition of fibrinogen and complement is observed along the basement membrane zone, with no immunoglobulin present other than in colloid bodies [15]. However, fibrinogen deposition is not specific to OLP as other oral potentially malignant disorders (OPMDs) can exhibit a similar pattern [15]. Moreover, deposits of fibrin at the basement membrane zone and IgM-positive cytoid bodies similar to those in OLP may also be present in oral lichenoid drug reactions [15].
In the absence of new diagnostic tools, algorithms designed to shorten the diagnostic process are particularly important. Diagnostic criteria for OLP were published in 1978 by the WHO; in 2003, they were modified by van der Meij and van der Waal [19], and in 2016, by the American Academy of Oral and Maxillofacial Pathology. However, these official guidelines do not fulfill informational needs; for this reason, various authors have undertaken the creation of diagnostic schemes. In 2019, Bilodeau and Lalla introduced a diagnostic algorithm for oral lesions, which was based solely on clinical symptoms [20]. The following year, Holla et al. [21] published a diagnostic algorithm for oral mucosal blistering diseases derived from histopathological and immunological findings but did not incorporate clinical symptoms. Also in 2019, Rashid et al. [22] published a diagnostic pathway flowchart for patients with oral blistering, focusing on mucous membrane pemphigoid, pemphigus vulgaris, and paraneoplastic pemphigus. Another 2019 publication [23] was characterized by a cautious approach, avoiding definitive statements and merely indicating the likelihood of specific diagnoses without providing numerical estimates. This publication advised referring patients to specialized hospitals and considering various diagnostic tests. The conservative nature of this tool, while reflective of the authors' careful methodology, limits its usefulness for clinical decision making.
We are witnessing an explosion of interest in the application of artificial intelligence (AI) in many fields including medicine. Machine learning (ML), a branch of AI, focuses on making predictions by identifying patterns within data. A specialized subset of ML, deep learning, uses multilayered neural network algorithms modeled after the human brain's complex structure to make predictions. ML processes training data to identify distinctive characteristics in medical records or images, subsequently classifying them into various disease categories. Neural network learning can be either supervised, where correct answers are provided, or unsupervised, where the network clusters objects based on their similarities. The performance reliability of ML is evaluated by validating these acquired features with separate validation data and further confirming through testing with a dedicated dataset [24]. In this article, as in the Statistica 13 software (Polish version), we refer to completing training as 'testing' and the assessment of classification accuracy as 'validation', which is the opposite of the usual practice in the literature of the field.
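The three-sample scheme described above can be sketched in a few lines of Python. This is only an illustration: the 70/15/15 proportions, the random seed, and the function name `split_cases` are assumptions made for the example and are not taken from the study. The sample names follow the convention used in this article ('testing' governs the completion of training, 'validation' gives the final accuracy estimate).

```python
import random


def split_cases(case_ids, train_frac=0.70, test_frac=0.15, seed=0):
    """Shuffle case IDs and split them into three disjoint samples.

    The fractions are hypothetical defaults chosen for illustration.
    """
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n_train = round(len(ids) * train_frac)
    n_test = round(len(ids) * test_frac)
    return {
        "learning": ids[:n_train],                  # used to fit the weights
        "testing": ids[n_train:n_train + n_test],   # used to decide when training is complete
        "validation": ids[n_train + n_test:],       # held out for the accuracy estimate
    }


samples = split_cases(range(80))
for name, members in samples.items():
    print(name, len(members))
```

With 80 cases and these assumed fractions, the held-out samples contain only about a dozen cases each, which is one reason accuracy estimates from such samples are coarse.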
In our view, none of the diagrams published until now were sufficiently comprehensive to effectively support clinicians. Therefore, we intended to combine medical history and DIF to create a new multivariate predictive model. For this purpose, we used a ready-made product, Statistica, as none of the authors are programmers. Because the involvement of an AI specialist comes at a very high cost, the purpose of this work was to make a preliminary screening assessment, using an off-the-shelf tool, of whether the integration of history, histological, and immunological data using a neural network is a promising avenue for the further development of AI-based tools for the diagnosis of lichen.

Basic Statistics
Statistical analysis was performed using Statistica 13. Qualitative variables were presented as numbers and percentages; for the continuous variable 'age', the median, interquartile range, and range were given. Relationships between qualitative variables were assessed using the chi-square test, with 0.05 as the level of significance. The continuous variable 'age' was compared between groups using the Mann-Whitney U test.
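For readers without access to Statistica, the comparison of 'age' between two groups can be approximated as follows. This is a minimal sketch, assuming a two-sided Mann-Whitney U test with the normal approximation and tie correction (no continuity correction); the software may use a different variant, and the ages below are hypothetical values, not study data.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    values = [v for v, _ in pooled]
    n = len(values)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all observations identical
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Hypothetical ages (years) in two groups, for illustration only.
ages_group_a = [34, 45, 52, 58, 61, 47]
ages_group_b = [55, 63, 49, 70, 66, 59]
print(mann_whitney_u(ages_group_a, ages_group_b))
```

For small groups an exact test is generally preferable to the normal approximation used here.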

Variants of Classification
Confirmation of lichen via histopathology is very difficult. Many histopathology results are false-negative despite clinical signs indicative of lichen. All patients who had a biopsy had clinical symptoms indicative of lichen, as suspected lichen was an inclusion criterion. According to diagnostic standards, lichen can be diagnosed if the histopathology result does not exclude such a diagnosis and the clinical picture is consistent with the disease. Thus, we conducted the analysis in two variants. In the first interpretation, a very narrow one, only patients with lichen confirmed in histopathology (HP) were classified as having lichen. In the second interpretation, standard for clinical practice, lichen was considered present in all patients in whom it was not excluded. Neural networks were created only for the variable "lichen not excluded in histopathology".

Artificial Neural Networks
Automatic neural networks were used as an alternative to multivariate analysis because logistic regression could not be performed due to anomalies of the maximum likelihood estimator in this dataset. Because the number of variables was excessive relative to the number of cases, the Data Mining module of Statistica was used to select an appropriate number of variables (in a ratio of 1:10 relative to cases, i.e., 8 variables) that could be entered into the machine learning model. Predictor significance was calculated by ranking the p-values for each predictor effect (for tied p-values, the ranking was based on the F statistic). The number of confirmed cases of lichen in the study sample prevented the use of machine learning methods to find predictors for the variable lichen confirmed (lichen features on histopathology), as this would have risked overfitting the model after splitting the data into a learning, test, and validation sample. Instead, such an analysis was performed for lichen features present or not excluded in histopathology. We used the default parameters of the Statistica 13 software (200 epochs; random, Gaussian initialization; stopping condition: 0.0000001; multilayer perceptron; cross validation).
The following variables were assessed: gender, age on the day of lesion onset, DIF IgG, DIF IgA, DIF IgM, DIF C3, DIF F1, DIF F2, lip lesions, nail lesions, stress during the study period, stress at onset, genital symptoms, erosions on palate, erosions on buccal mucosa (right side/left side), erosions on tongue, erosions under tongue, erosions on maxillary gingiva, erosions on mandibular gingiva, erosions on upper lip, erosions on lower lip, white patches on palate, white patches on buccal mucosa (right side/left side), white patches on tongue, white patches under tongue, white patches on maxillary gingiva, white patches on mandibular gingiva, white patches on lower lip, white patches on upper lip, whether patient was previously treated by a dermatologist, whether patient was previously treated by a dentist, whether patient was previously treated by a general practitioner, any previous treatment, taking supplements, taking herbs, taking any medication, dental status, smoking, flossing, and using mouthwash. Four statistically significant variables were selected for machine learning.
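The screening logic described above (cap the number of candidate variables at one per ten cases, rank by p-value, break ties by the F statistic, then keep the significant ones) can be sketched as follows. This does not reproduce the Statistica Data Mining module; the class `ScreeningResult`, both function names, and all numeric values are hypothetical and serve only to illustrate the ranking rule.

```python
from typing import NamedTuple


class ScreeningResult(NamedTuple):
    name: str
    p_value: float
    f_statistic: float


def shortlist_predictors(results, n_cases, cases_per_variable=10):
    """Rank by ascending p-value (ties: larger F first) and cap the list length."""
    max_variables = n_cases // cases_per_variable  # 80 cases -> at most 8 variables
    ranked = sorted(results, key=lambda r: (r.p_value, -r.f_statistic))
    return ranked[:max_variables]


def significant_predictors(results, alpha=0.05):
    """Keep only shortlisted predictors below the significance level."""
    return [r.name for r in results if r.p_value < alpha]


# Hypothetical screening output, not taken from the study.
screening = [
    ScreeningResult("predictor_a", 0.20, 1.7),
    ScreeningResult("predictor_b", 0.01, 7.1),
    ScreeningResult("predictor_c", 0.01, 9.4),
    ScreeningResult("predictor_d", 0.04, 4.3),
]
shortlist = shortlist_predictors(screening, n_cases=80)
print([r.name for r in shortlist])
print(significant_predictors(shortlist))
```

Separating the shortlist step from the significance filter mirrors the text: up to 8 variables could be proposed, of which four were significant and entered the networks.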

Study Population
The study group consisted of 80 patients: 63 (78.8%) women and 17 (21.2%) men. Lichen was confirmed by histopathology in four (5.0%) of the study participants and not confirmed in fifty-seven (71.2%); it was not excluded in thirty subjects (37.5%) and excluded in thirty-one (38.8%). The remaining values represent missing data (non-diagnostic histopathology). The characteristics of the patients participating in the study are shown in Table 1.

Direct Immunofluorescence vs. Histopathology
The incidences of DIF IgG, DIF IgA, DIF IgM, DIF C3, DIF F1, and DIF F2 positivity did not differ significantly either between subjects with confirmed and unconfirmed lichen or between subjects with lichen excluded and not excluded (Table 2). DIF IgG, DIF IgM, and DIF IgA did not show associations with DIF F1 or DIF F2 scores.
Immunological patterns are shown in Table 3. Most of the patients (n = 37, 61.7%) were negative for all tested markers. Among four patients with lichen confirmed, all were negative, whereas among fifty-six patients with lichen not definitely confirmed, only thirty-three (58.9%) were negative. Among twenty-nine patients with lichen not excluded, fifteen (51.7%) were negative and fourteen (48.3%) had another pattern, whereas among thirty-one patients with lichen excluded, twenty-two (71.0%) were negative and nine (29.0%) had another pattern. However, these differences were not significant (p = 0.103 and p = 0.126, respectively). The immunological pattern was not available for two patients.
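The two comparisons above can be checked with a Pearson chi-square test on the 2x2 counts given in the text (all-negative pattern vs. another pattern). The sketch below assumes the uncorrected test with one degree of freedom; the function name `chi_square_2x2` is introduced here for illustration and the exact procedure used by the software is not stated in the text.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = 0.0
    for observed, row, col in ((a, row1, col1), (b, row1, col2),
                               (c, row2, col1), (d, row2, col2)):
        expected = row * col / n
        statistic += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: lichen confirmed / not confirmed; columns: all-negative / another pattern.
stat_confirmed, p_confirmed = chi_square_2x2(4, 0, 33, 23)
# Rows: lichen not excluded / excluded; columns: all-negative / another pattern.
stat_excluded, p_excluded = chi_square_2x2(15, 14, 22, 9)
print(round(p_confirmed, 3), round(p_excluded, 3))
```

On these counts the sketch gives p-values of roughly 0.10 and 0.13, neither below 0.05. Because one expected count in the first table is below 5, an exact test would usually be preferred for that comparison.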
DIF F1 and DIF F2 scores were associated with DIF C3 scores (Table 4). In the overall group, a positive DIF F1 score increased the odds of a positive DIF C3 score approximately fourfold (OR 4.080, 95% CI: 1.0909 to 15.2594, p = 0.0367), while a positive DIF F2 score increased these odds almost fivefold (OR 4.892, 95% CI: 1.2902 to 18.5514, p = 0.0196). p-values marked in red are statistically significant (p < 0.05).
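An odds ratio with a 95% confidence interval of this kind is commonly obtained from a 2x2 table with the Wald (log) method, sketched below. The cell counts in the example are hypothetical, since the underlying counts are given only in Table 4, and the choice of the Wald method is an assumption. It is, however, consistent with the reported intervals, which are symmetric on the log scale: the geometric mean of each pair of limits equals the reported odds ratio.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for the table [[a, b], [c, d]].

    a: exposed with outcome, b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    All cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
example_or, example_lower, example_upper = odds_ratio_ci(10, 5, 5, 10)
print(example_or, example_lower, example_upper)

# Consistency check on the reported limits: sqrt(lower * upper) should equal the OR.
geo_mean_f1 = math.sqrt(1.0909 * 15.2594)
geo_mean_f2 = math.sqrt(1.2902 * 18.5514)
print(round(geo_mean_f1, 3), round(geo_mean_f2, 3))
```

The wide intervals, with lower limits close to 1, reflect the small number of patients in some cells.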

Artificial Neural Network
The variables suggested by the Data Mining module as being relatively best for creating a predictive model are shown in Table 5.
Table 5. Variables proposed to be entered into the analysis by the Data Mining module for the variable 'lichen planus not excluded by histopathology'. For lichen confirmed, none of the variables was significant.
Machine learning resulted in five networks with very similar parameters (Table 6). MLP: multilayer perceptron. The code in the first column represents the number of neurons in the input layer, the number of neurons in the hidden layer, and the number of neurons in the output layer.
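To make the network code concrete, the sketch below parses a label of this form and counts the trainable parameters of the corresponding fully connected network. The label "MLP 4-6-2" is a hypothetical example and not one of the networks in Table 6; the count assumes one bias term per hidden and output neuron, which is the usual convention but is not stated in the text.

```python
def parse_mlp_code(code):
    """Split a label such as 'MLP 4-6-2' into (inputs, hidden, outputs)."""
    kind, layers = code.split()
    if kind != "MLP":
        raise ValueError(f"unsupported network type: {kind}")
    n_in, n_hidden, n_out = (int(part) for part in layers.split("-"))
    return n_in, n_hidden, n_out


def count_parameters(n_in, n_hidden, n_out):
    """Weights plus biases of a single-hidden-layer, fully connected MLP."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


layers = parse_mlp_code("MLP 4-6-2")  # hypothetical label
print(layers, count_parameters(*layers))
```

Even a small network of this shape has dozens of free parameters, which illustrates why the number of input variables had to be restricted relative to the number of cases.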

The models created in this way achieved approximately 71% correct classifications in the validation sample. The ROC curves for the models are shown in Figure 1.
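The two performance measures mentioned here, the share of correct classifications and the area under the ROC curve, can be computed from predicted scores as sketched below. The labels, scores, and 0.5 decision threshold are hypothetical and do not come from the study; the AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def accuracy(labels, predictions):
    """Fraction of cases whose predicted class equals the true class."""
    correct = sum(1 for y, y_hat in zip(labels, predictions) if y == y_hat)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out sample: 1 = lichen not excluded, 0 = lichen excluded.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.6, 0.3, 0.2, 0.1, 0.05]
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(labels, predictions), roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas the AUC summarizes performance over all thresholds, which is why the two are usually reported together.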

Discussion
Our study confirmed previous observations regarding the weak association between the localization of lesions and the diagnosis of lichen. In this group of patients, erosions or white patches under the tongue excluded this diagnosis, and erosions on the mandibular gingiva significantly decreased the risk of having a lichen lesion but did not exclude it. Meanwhile, in the study conducted by Keller, 41.6% of 79 patients with OLP had lesions on the gingiva; for the rest of the cases, the gingival lesions were connected to other areas, particularly the buccal mucosa and tongue [25]. Another study described lesions diagnosed as OLP mainly on the buccal mucosa (93.9%) followed by the gingiva (59.7%), mucobuccal fold (26.8%), tongue (26.8%), palate (7.3%), and vermilion border (7.3%) [18]. These differences suggest that the localization of lesions is relevant, but its role remains unclear and it cannot serve as an independent criterion for diagnosis.
In the diagnosis of lichen planus, DIF is used to detect the presence of anti-keratinocyte antibodies in the patient's tissue. These antibodies are directed against keratin antigens, which are found on the surfaces of keratinocytes and the basement membrane of the skin. Positive DIF in patients with lichen has been described repeatedly but rates have varied. For example, the percentage of patients found to have fibrinogen deposits was 37-100% [32,33]. The variation in the percentage of positive DIF results may be due to the different clinical patterns of lichen, of which there are six: reticular, papular, plaque-like, atrophic, erosive, and bullous. In one study, fibrinogen deposition was found at the BMZ in 73% of cases of reticular OLP and 57% of cases of plaque-like OLP [28]. It may also depend on the location of the lesion. Bujeeb, who investigated 85 Thai patients with OLP, observed positive DIF in 94% of biopsies taken from the buccal mucosa, 64% of those taken from the gingiva, and 50% of those taken from the palate [18]. In 2022, Mao et al. [33] described eight patterns of DIF in patients with OLP. Among 65 patients, 15 (35.7%) were quadruple (IgM, IgA, IgG, C3)-negative; two (4.8%) were quadruple-positive; ten (23.8%) were positive only for C3; seven (16.7%) were positive only for IgM; four (9.5%) were positive for IgM and IgA; three (7.2%) were positive for IgM and C3; and one (2.4%) was positive for IgM, IgG, and C3. Additionally, the two clinical subtypes did not differ from each other in terms of DIF. A similar analysis conducted by Bujeeb, including the location of deposits, revealed 12 DIF patterns, of which the most common were the following: a) fibrinogen deposits along the basement membrane zone (F BMZ) and IgM at colloid bodies (CBs) (35%); b) F BMZ (28%); c) F BMZ and IgM and IgA at CBs (12%); and d) F BMZ and IgM, IgA, and C3 at C (10%). Other types were sporadic (1.5% each).
In our study, none of the four patients with OLP confirmed in HP had a positive DIF result. Substituting the positivity rates known from the literature as the success probability in a binomial distribution, the probability of observing no positive result among four patients is 0.16 for a success probability of 37% and 0.0016 for a success probability of 80%. This shows that the percentage we obtained was significantly different from the results of at least those authors who obtained the highest percentages, but a group of only four people does not allow us to draw radical conclusions.
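The binomial reasoning above can be reproduced directly: for a given literature positivity rate p, the probability of seeing zero positive DIF results among four independent patients is (1 - p)^4. The short sketch below uses only the two rates quoted in the text (37% and 80%); the function name is introduced here for illustration.

```python
from math import comb


def prob_k_positive(k, n, p):
    """Binomial probability of exactly k positive results among n patients."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


for rate in (0.37, 0.80):
    print(rate, round(prob_k_positive(0, 4, rate), 4))
```

A result of zero positives out of four is thus unsurprising if the true rate is near 37% but very unlikely if it is near 80%.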
The studies by other authors cited above, as well as our results, showed the limitations of using DIF for the diagnosis of lichen, probably due to the complex nature of this disease, which has multiple phenotypes. Our attempt to create a multivariate classification model did not allow us to create a diagnostically useful tool based on the dataset we collected. Neural networks created based on our dataset achieved approximately 71% correct classifications in the validation sample. This value is too low to recommend the creation of a clinically useful tool on this basis but suggests that further work on optimizing the model on a larger number of cases and new variables acquired (in particular after including more patients with histopathologically confirmed lichen, which would allow the dependent variable to be changed) could potentially lead to such a tool. Perhaps the combination of data collected by several authors would allow the creation of such a tool. It should be borne in mind, however, that the neural networks that would be applied in clinical practice meet the criteria of a medical device and are subject to all the resulting regulations. Despite the difficulties involved in the development and eventual registration of effective neural networks, their use could help resolve OLP classification and diagnosis problems, but for this, it is necessary to find better predictors than those collected in our study. In particular, they may be useful when there are serious contraindications to biopsy.
Our study had several limitations. First, there were only four patients with lichen confirmed in HP. Given that there are six types of lichen, our sample did not reflect all phenotypes of this disease, especially as patients were not stratified by type. Unlike the authors cited above, we did not collect detailed data on the location and appearance of deposits within the cell, which would have provided another variable that could have been included in the analysis. Furthermore, in many cases, biopsies were not performed or the results were non-diagnostic, which narrowed the dataset for analysis. Although histopathology is a gold standard, this examination can also yield unreliable results. Last, we used commercially available statistical software. The information provided by the software manufacturer regarding methods and results is limited and does not contain the full data that would be available if the network were created directly by a developer. Ideally, further work on the development of tools should involve an artificial intelligence specialist. This report can only serve as a preliminary screening analysis, indicating the possibility and necessity of further work in this direction.
Nevertheless, studies of this type are scarce in the literature, so despite these limitations, our work has made some contribution to the current state of knowledge by at least expanding the pool of patients described in this way. Furthermore, our analysis complements the 2022 publication by Mao et al. [33], whose authors assessed the correlation of IgG, IgM, IgA, C3, and C4 with each other and with RAE score and flow cytometry results but did not include fibrinogen.
Due to all the above limitations, our publication is very preliminary. The main takeaway from this work should be that such classification tools based on neural networks, increasingly used for decision making in medicine and veterinary science [34], can be particularly useful in clinical problems like the diagnosis of lichen planus. However, it is necessary to develop such tools using larger datasets than those we have gathered. We are open to collaborating with other researchers who possess similar data or can collect them.
The optimal direction for the further development of AI-based tools for OLP diagnosis seems to be the creation of models that integrate data from history, immunological tests, and imaging studies. Neural networks have shown high accuracy in classifying superficial images of lesions as OLP vs. non-OLP (88.18%) [35]. In the research conducted by Keser et al. [36], a network trained on photographic images of the buccal mucosa, including 65 healthy samples and 72 samples with oral lichen planus lesions, achieved 100% accurate classifications, confirmed by experts in Oral Medicine and Maxillofacial Radiology. These results are very encouraging but the test and validation sample sizes in this study were very small (n = 7), resulting in a high risk of overfitting. Not only an increase in sample size but also the inclusion of ultrasound images could improve the quality of classification. The ultrasonography of oral mucosal lesions is a relatively new method that faces many barriers. The miniaturization of probes without the loss of image quality is a technical challenge; the probes used in the available literature are prototypes or instruments adapted from other areas of medicine [37]. A systematic review published in 2023 showed that most of the few publications on the ultrasound of oral lesions concerned animals or healthy participants [37]. There is currently only one publication on the use of ultra-high-frequency ultrasound in patients with OLP [38]. Its authors presented ultrasound findings but, due to the small sample size, did not determine the sensitivity or specificity of this test in this indication. A similar paper by the same authors in patients with pemphigus vulgaris (PV) or mucous membrane pemphigoid (MMP) showed 75% sensitivity in the diagnosis of PV and 66.7% in the diagnosis of MMP [39]. As can be seen, no single diagnostic tool guarantees reliability in the diagnosis of OLP, so it is necessary to continue the development of multivariate classification models that integrate imaging, clinical, and immunological data. Ultimately, however, it must be underlined that any classification model that would enter clinical practice must be registered as a medical device in the EU [40]; such registration may also be required in the USA [41].

Conclusions
In conclusion, we confirmed that OLP poses diagnostic difficulties and that factors described in the literature as being associated with OLP are not univariate predictors. The use of an advanced method of analyzing the collected variables resulted in quite good classification, but the quality of this model is still insufficient for the use of such a tool in clinical practice, so the search for further diagnostic features is indispensable.

Diagnostics 2024, 12
Figure 1. ROC curve for the variable "lichen not excluded in histopathology". MLP: multilayer perceptron. The number represents the number of neurons in the input layer, in the hidden layer, and in the output layer.


Table 1. Characteristics of the study group based on the relative best predictors for the dependent variable analyzed (lichen not excluded in HP). p-values marked in red are statistically significant (p < 0.05).

Table 2. DIF results in patient subgroups distinguished by histopathology.

Table 3. Immunological patterns. + means positive test result, − means negative test result.

Table 4. Relationship between DIF IgG, DIF IgM, DIF IgA, and DIF C3 results and DIF F1 and DIF F2 results.

Table 6. Automatic neural network classifiers for the variable 'lichen not excluded in HP'.